
Appendix C: Python-Specific Notes

Understanding Python's Recursion Limit

Python imposes a maximum recursion depth to prevent stack overflow crashes. This limit exists because every recursive call consumes stack space, and unbounded recursion would eventually exhaust available memory, crashing the interpreter.

The Default Limit: 1000 Calls

Python's default recursion limit is 1000 stack frames. This means your recursive function can nest roughly 1000 levels deep (slightly fewer in practice, because frames already on the stack count toward the limit) before Python raises a RecursionError.

Let's see this limit in action with a simple countdown function:

import sys

def countdown(n):
    """Count down from n to 0 recursively."""
    if n <= 0:
        return "Done!"
    print(f"Counting: {n}")
    return countdown(n - 1)

# Check the current recursion limit
print(f"Current recursion limit: {sys.getrecursionlimit()}")

# This will work fine
result = countdown(10)
print(result)

Output:

Current recursion limit: 1000
Counting: 10
Counting: 9
Counting: 8
Counting: 7
Counting: 6
Counting: 5
Counting: 4
Counting: 3
Counting: 2
Counting: 1
Done!

Now let's exceed the limit:

# This will crash
try:
    result = countdown(1500)
except RecursionError as e:
    print(f"RecursionError: {e}")

Python Output:

Counting: 1500
Counting: 1499
Counting: 1498
...
Counting: 503
Counting: 502
Counting: 501
Traceback (most recent call last):
  File "example.py", line 15, in <module>
    result = countdown(1500)
  File "example.py", line 8, in countdown
    return countdown(n - 1)
  File "example.py", line 8, in countdown
    return countdown(n - 1)
  File "example.py", line 8, in countdown
    return countdown(n - 1)
  [Previous line repeated 995 more times]
  File "example.py", line 8, in countdown
    return countdown(n - 1)
RecursionError: maximum recursion depth exceeded

Diagnostic Analysis: Understanding the Failure

The complete execution trace:

The function successfully counted from 1500 down to approximately 501, then crashed. The traceback shows the recursive call chain at the point of failure.

Let's parse this section by section:

  1. The error message: RecursionError: maximum recursion depth exceeded
     - What this tells us: Python detected that the call stack depth reached its configured limit (1000 by default)

  2. The call stack depth:
     - Current depth: ~1000 (the limit)
     - Expected depth: 1500 (to count from 1500 to 0)
     - Key observation: We needed 1500 stack frames but Python only allows 1000

  3. The repeated line notation: [Previous line repeated 995 more times]
     - What this tells us: Python truncates the traceback display for readability
     - Critical insight: The actual call stack had ~1000 frames, but Python only shows a sample

  4. The recursive call pattern:
     - Expected: countdown(1500) → countdown(1499) → ... → countdown(0) → return
     - Actual: countdown(1500) → countdown(1499) → ... → countdown(501) → CRASH
     - Key difference: The recursion depth exceeded Python's safety limit before reaching the base case

Root cause identified: The problem requires more recursive depth than Python's default limit allows.

Why the current approach can't solve this: With the default limit of 1000, any input greater than ~995 will fail (accounting for some overhead from other function calls in the program).

What we need: Either increase the recursion limit, or redesign the algorithm to avoid deep recursion.

Why 1000?

The limit of 1000 is a conservative default that balances safety and practicality:

Most well-designed recursive algorithms don't need more than a few hundred levels of depth. If you're hitting the limit, it's often a sign that:

  1. Your algorithm has a bug (infinite recursion)
  2. Your problem size is too large for recursive solution
  3. You need to optimize your approach (memoization, tail recursion conversion)
  4. You legitimately need to increase the limit (rare)
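
When the cause is case 1 (a missing or unreachable base case), an explicit depth guard can surface the bug with a clearer error than waiting for the eventual RecursionError. A minimal sketch, assuming an illustrative max_depth of 500 and a hypothetical countdown_guarded function:

def countdown_guarded(n, _depth=0, max_depth=500):
    """Countdown with an explicit depth guard for debugging."""
    if _depth > max_depth:
        raise RuntimeError(
            f"Recursion deeper than {max_depth} levels: likely a missing base case"
        )
    if n <= 0:
        return "Done!"
    return countdown_guarded(n - 1, _depth + 1, max_depth)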

When and How to Increase the Recursion Limit

The Tool: sys.setrecursionlimit()

Python provides sys.setrecursionlimit(limit) to modify the maximum recursion depth. This is a global setting that affects all recursive functions in your program.

import sys

# Check current limit
print(f"Default limit: {sys.getrecursionlimit()}")

# Increase the limit
sys.setrecursionlimit(2000)
print(f"New limit: {sys.getrecursionlimit()}")

# Now our countdown can go deeper
def countdown(n):
    if n <= 0:
        return "Done!"
    return countdown(n - 1)

# This now works
result = countdown(1500)
print(result)  # Output: Done!

Output:

Default limit: 1000
New limit: 2000
Done!

When to Increase the Limit

Increase the recursion limit only when:

1. You Have a Legitimate Deep Recursion Need

Valid scenario: Processing deeply nested data structures where depth is data-dependent.

import sys
import json

def count_nested_keys(obj, depth=0):
    """Count total keys in a deeply nested JSON structure."""
    if not isinstance(obj, dict):
        return 0

    count = len(obj)
    for value in obj.values():
        if isinstance(value, dict):
            count += count_nested_keys(value, depth + 1)
        elif isinstance(value, list):
            for item in value:
                count += count_nested_keys(item, depth + 1)
    return count

# Deeply nested configuration file (1200 levels deep)
# This is real-world data, not a contrived example
sys.setrecursionlimit(1500)  # Set higher than expected depth

with open('deeply_nested_config.json') as f:
    config = json.load(f)
    total_keys = count_nested_keys(config)
    print(f"Total configuration keys: {total_keys}")

2. You've Verified the Algorithm is Correct

Before increasing the limit, ensure your recursion actually terminates:

def factorial(n):
    """Factorial with validation."""
    if n < 0:
        raise ValueError("Factorial undefined for negative numbers")
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)

# Test with small input first
assert factorial(5) == 120
assert factorial(10) == 3628800

# Now safe to use with larger inputs
sys.setrecursionlimit(2000)
result = factorial(1500)  # This will work (though the number is huge)

3. You've Considered and Rejected Alternatives

Decision framework:

| Consideration | Increase Limit | Use Alternative |
|---|---|---|
| Problem size is data-dependent and unpredictable | ✓ | |
| Recursion depth is bounded and known | ✓ | |
| Algorithm is naturally recursive (trees, graphs) | ✓ | |
| Simple linear recursion (countdown, factorial) | | ✓ Use iteration |
| Depth could be arbitrarily large | | ✓ Use iteration or stack |
| Performance is critical | | ✓ Consider memoization first |

When NOT to Increase the Limit

1. Your Algorithm Has a Bug

Symptom: Recursion depth grows without bound.

def broken_countdown(n):
    """BROKEN: Missing base case check."""
    print(f"Counting: {n}")
    return broken_countdown(n - 1)  # Never stops!

# DON'T DO THIS:
# sys.setrecursionlimit(10000)  # This just delays the inevitable crash
# broken_countdown(5)  # Will still crash, just later

# FIX THE BUG INSTEAD:
def fixed_countdown(n):
    if n <= 0:  # ← Added base case
        return "Done!"
    print(f"Counting: {n}")
    return fixed_countdown(n - 1)

2. Iteration Would Be Clearer

Symptom: You're using recursion for simple linear processing.

# DON'T increase limit for this:
def sum_to_n_recursive(n):
    if n <= 0:
        return 0
    return n + sum_to_n_recursive(n - 1)

# sys.setrecursionlimit(100000)  # Bad idea
# sum_to_n_recursive(50000)  # Slow and unnecessary

# USE ITERATION INSTEAD:
def sum_to_n_iterative(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

# Or even better, use the formula:
def sum_to_n_formula(n):
    return n * (n + 1) // 2

# All three give the same result, but formula is O(1):
print(sum_to_n_iterative(50000))  # Fast
print(sum_to_n_formula(50000))    # Instant

3. You're Hitting Memory Limits

Symptom: Even with increased limit, you get MemoryError or system crashes.

Each recursive call consumes stack space (typically 1-8 KB per frame). Deep recursion can exhaust available memory:

Python Output (when memory is exhausted):

MemoryError: Stack overflow

Or worse, the system may kill the process without a Python exception.

How to Safely Increase the Limit

Step 1: Calculate Required Depth

Determine the maximum depth your algorithm needs:

def calculate_tree_depth(node):
    """Calculate maximum depth of a tree."""
    if node is None:
        return 0
    if not node.children:
        return 1
    return 1 + max(calculate_tree_depth(child) for child in node.children)

# Measure your data first
max_depth = calculate_tree_depth(root_node)
print(f"Tree depth: {max_depth}")

# Set limit with safety margin (2x is common)
required_limit = max_depth * 2
sys.setrecursionlimit(required_limit)

Step 2: Add Safety Margin

Always set the limit higher than your calculated need:

import sys

def set_safe_recursion_limit(expected_depth, safety_factor=2.0):
    """Set recursion limit with safety margin."""
    current_limit = sys.getrecursionlimit()
    required_limit = int(expected_depth * safety_factor)

    if required_limit > current_limit:
        # Add extra buffer for Python's internal calls
        new_limit = required_limit + 100
        sys.setrecursionlimit(new_limit)
        print(f"Recursion limit increased: {current_limit}{new_limit}")
    else:
        print(f"Current limit ({current_limit}) is sufficient")

# Usage
set_safe_recursion_limit(expected_depth=800)

Step 3: Wrap in Try-Except

Always handle potential RecursionError:

import sys

def process_deep_structure(data, max_depth=1000):
    """Process data with recursion limit protection."""
    original_limit = sys.getrecursionlimit()

    try:
        # Set higher limit temporarily
        sys.setrecursionlimit(max_depth + 100)
        result = recursive_processor(data)
        return result
    except RecursionError:
        print(f"Data too deep (exceeds {max_depth} levels)")
        print("Consider using iterative approach or increasing max_depth")
        return None
    finally:
        # Always restore original limit
        sys.setrecursionlimit(original_limit)

def recursive_processor(data):
    """Your recursive algorithm here."""
    # ... implementation ...
    pass

Platform Considerations

Different platforms have different stack size limits:

| Platform | Typical Stack Size | Safe Recursion Limit |
|---|---|---|
| Linux (default) | 8 MB | ~10,000 |
| macOS | 8 MB | ~10,000 |
| Windows | 1 MB | ~1,000 |
| Python (default) | N/A | 1,000 |

Key insight: Python's default limit of 1,000 is chosen to work safely across all platforms.

Checking Available Stack Space

import sys

def get_stack_info():
    """Get current stack size limits (Unix-like systems only)."""
    try:
        import resource  # Not available on Windows
        soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
        hard_str = "unlimited" if hard == resource.RLIM_INFINITY else f"{hard / (1024**2):.1f} MB"
        print(f"Stack size limit: {soft / (1024**2):.1f} MB (soft), {hard_str} (hard)")

        recursion_limit = sys.getrecursionlimit()
        print(f"Python recursion limit: {recursion_limit}")

        # Rough estimate of safe limit
        bytes_per_frame = 8192  # Conservative estimate
        safe_limit = int((soft * 0.8) / bytes_per_frame)
        print(f"Estimated safe recursion limit: {safe_limit}")
    except (ImportError, AttributeError, ValueError):
        print("Stack size information not available on this platform")

get_stack_info()

Output (on typical Linux system):

Stack size limit: 8.0 MB (soft), unlimited (hard)
Python recursion limit: 1000
Estimated safe recursion limit: 819
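
If a legitimately deep recursion needs more stack than the platform default provides, one hedged option is to run the work in a worker thread that is given a larger C stack via threading.stack_size, in addition to raising the recursion limit. A minimal sketch, assuming an illustrative helper name run_with_big_stack and a 128 MB stack (exact safe values remain platform-dependent):

import sys
import threading

def run_with_big_stack(func, *args, stack_mb=128, recursion_limit=100_000):
    """Run func(*args) in a thread that gets a larger stack and recursion limit."""
    result = {}

    def worker():
        sys.setrecursionlimit(recursion_limit)  # Global setting; raised for this run
        result["value"] = func(*args)

    threading.stack_size(stack_mb * 1024 * 1024)  # Must be set before the thread starts
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result["value"]

def countdown(n):
    return "Done!" if n <= 0 else countdown(n - 1)

print(run_with_big_stack(countdown, 50_000))  # Done!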

Best Practices Summary

DO:
  - ✓ Measure your actual recursion depth needs
  - ✓ Set the limit temporarily and restore it
  - ✓ Add a safety margin (2x expected depth)
  - ✓ Wrap calls in try-except for RecursionError
  - ✓ Document why you increased the limit
  - ✓ Test on the target platform

DON'T:
  - ✗ Set arbitrarily high limits (e.g., 1,000,000)
  - ✗ Increase the limit to "fix" infinite recursion bugs
  - ✗ Forget to restore the original limit
  - ✗ Assume the same limit works on all platforms
  - ✗ Use recursion when iteration is clearer

Alternatives to Deep Recursion

When recursion depth becomes a problem, several alternative approaches can solve the same problems without deep call stacks.

Alternative 1: Convert to Iteration

The most straightforward alternative is converting recursive logic to iterative logic using explicit loops.

Pattern: Linear Recursion → While Loop

Reference Implementation: Let's start with a recursive function that processes a linked list:

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def sum_list_recursive(node):
    """Sum all values in a linked list recursively."""
    if node is None:
        return 0
    return node.value + sum_list_recursive(node.next)

# Create a linked list: 1 -> 2 -> 3 -> 4 -> 5
head = Node(1, Node(2, Node(3, Node(4, Node(5)))))

print(sum_list_recursive(head))  # Output: 15

Current limitation: This works fine for short lists, but what happens with a very long list?

# Create a long linked list (2000 nodes)
head = Node(1)
current = head
for i in range(2, 2001):
    current.next = Node(i)
    current = current.next

try:
    result = sum_list_recursive(head)
    print(f"Sum: {result}")
except RecursionError as e:
    print(f"RecursionError: {e}")

Python Output:

RecursionError: maximum recursion depth exceeded in comparison

Diagnostic Analysis: Understanding the Failure

The complete execution trace:

The function successfully processed the first ~1000 nodes, then crashed when trying to make the 1001st recursive call.

Let's parse this section by section:

  1. The error message: RecursionError: maximum recursion depth exceeded in comparison
     - What this tells us: Python hit the recursion limit while comparing values (in the if node is None check)

  2. The call stack depth:
     - Current depth: ~1000 (the limit)
     - Expected depth: 2000 (one call per node)
     - Key observation: Each node requires one stack frame, so a 2000-node list needs 2000 frames

  3. The recursive call pattern:
     - Expected: sum_list(node₁) → sum_list(node₂) → ... → sum_list(node₂₀₀₀) → sum_list(None) → return
     - Actual: sum_list(node₁) → sum_list(node₂) → ... → sum_list(node₁₀₀₀) → CRASH
     - Key difference: Linear recursion depth equals list length

Root cause identified: Linear recursion on long sequences exceeds Python's stack limit.

Why the current approach can't solve this: Even if we increase the recursion limit, we're wasting stack space on a problem that doesn't need recursion.

What we need: An iterative approach that uses constant stack space.

Iteration 1: Converting to Iterative Form

After (Iterative):

def sum_list_iterative(node):
    """Sum all values in a linked list iteratively."""
    total = 0
    current = node
    while current is not None:
        total += current.value
        current = current.next
    return total

# Test with the long list
head = Node(1)
current = head
for i in range(2, 2001):
    current.next = Node(i)
    current = current.next

result = sum_list_iterative(head)
print(f"Sum: {result}")  # Output: 2001000

Improvement: Now handles lists of any length without recursion depth issues.

Complexity Analysis:

Recursive version:
  - Time Complexity: O(n)
  - Space Complexity: O(n) (call stack depth equals list length)

Iterative version:
  - Time Complexity: O(n)
  - Space Complexity: O(1) (only uses a few variables)

When to Apply This Solution:
  - What it optimizes for: Constant space usage, no recursion limit concerns
  - What it sacrifices: The elegance of recursive thinking
  - When to choose this approach: Linear processing of sequences, especially long ones
  - When to avoid this approach: When recursion naturally expresses the problem structure (trees, divide-and-conquer)

Pattern: Tree Recursion → Stack-Based Iteration

Reference Implementation: Let's process a binary tree recursively:

class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def sum_tree_recursive(node):
    """Sum all values in a binary tree recursively."""
    if node is None:
        return 0
    return node.value + sum_tree_recursive(node.left) + sum_tree_recursive(node.right)

# Create a small tree
#       1
#      / \
#     2   3
#    / \
#   4   5
root = TreeNode(1,
    TreeNode(2, TreeNode(4), TreeNode(5)),
    TreeNode(3)
)

print(sum_tree_recursive(root))  # Output: 15

Current limitation: This works for balanced trees, but what about a very deep, unbalanced tree?

# Create a very deep, unbalanced tree (2000 levels deep)
root = TreeNode(1)
current = root
for i in range(2, 2001):
    current.left = TreeNode(i)
    current = current.left

try:
    result = sum_tree_recursive(root)
    print(f"Sum: {result}")
except RecursionError as e:
    print(f"RecursionError: {e}")

Python Output:

RecursionError: maximum recursion depth exceeded

Diagnostic Analysis: Understanding the Failure

Root cause identified: Tree depth exceeds recursion limit.

What we need: An iterative approach using an explicit stack to simulate the call stack.

Iteration 1: Converting to Stack-Based Iteration

After (Stack-based iterative):

def sum_tree_iterative(node):
    """Sum all values in a binary tree using explicit stack."""
    if node is None:
        return 0

    total = 0
    stack = [node]  # Explicit stack replaces call stack

    while stack:
        current = stack.pop()
        total += current.value

        # Add children to stack (right first, so left is processed first)
        if current.right:
            stack.append(current.right)
        if current.left:
            stack.append(current.left)

    return total

# Test with the deep tree
root = TreeNode(1)
current = root
for i in range(2, 2001):
    current.left = TreeNode(i)
    current = current.left

result = sum_tree_iterative(root)
print(f"Sum: {result}")  # Output: 2001000

Improvement: Now handles trees of any depth without recursion depth issues.

Complexity Analysis:

Recursive version:
  - Time Complexity: O(n)
  - Space Complexity: O(h), where h is tree height (call stack depth equals tree height)

Iterative version:
  - Time Complexity: O(n)
  - Space Complexity: O(h) (explicit stack size is proportional to tree height)

Key insight: Both use O(h) space, but the iterative version uses heap memory (which is much larger) instead of stack memory (which is limited).

When to Apply This Solution:
  - What it optimizes for: Avoiding the recursion limit by using heap instead of stack
  - What it sacrifices: Code clarity (explicit stack management is more verbose)
  - When to choose this approach: Deep trees, graphs, or when the recursion limit is a concern
  - When to avoid this approach: Shallow structures where recursion is clearer

Alternative 2: Tail Recursion with Trampolining

Some recursive functions can be converted to tail-recursive form, then executed using a "trampoline" that eliminates stack growth.

Understanding Tail Recursion

A function is tail-recursive if the recursive call is the last operation (nothing happens after the recursive call returns).

Not tail-recursive:

def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)  # ← Multiplication happens AFTER recursive call

Tail-recursive (using accumulator):

def factorial_tail(n, accumulator=1):
    if n <= 1:
        return accumulator
    return factorial_tail(n - 1, n * accumulator)  # ← Recursive call is last operation

Problem: Python doesn't optimize tail calls, so this still uses O(n) stack space.

Solution: Use a trampoline to convert tail recursion to iteration.

The Trampoline Pattern

Reference Implementation: Let's start with a tail-recursive factorial:

def factorial_tail(n, accumulator=1):
    """Tail-recursive factorial."""
    if n <= 1:
        return accumulator
    return factorial_tail(n - 1, n * accumulator)

# This still hits recursion limit
try:
    result = factorial_tail(2000)
    print(f"Factorial: {result}")
except RecursionError as e:
    print(f"RecursionError: {e}")

Python Output:

RecursionError: maximum recursion depth exceeded in comparison

Iteration 1: Adding Trampoline

After (Trampolined):

class Thunk:
    """Represents a delayed computation."""
    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self):
        return self.func(*self.args, **self.kwargs)

def trampoline(thunk):
    """Execute a chain of thunks iteratively."""
    result = thunk
    while isinstance(result, Thunk):
        result = result()
    return result

def factorial_trampolined(n, accumulator=1):
    """Tail-recursive factorial that returns thunks."""
    if n <= 1:
        return accumulator
    # Return a thunk instead of calling recursively
    return Thunk(factorial_trampolined, n - 1, n * accumulator)

# Now this works for any depth
result = trampoline(factorial_trampolined(2000))
print(f"Factorial computed successfully (very large number)")
print(f"Number of digits: {len(str(result))}")

Output:

Factorial computed successfully (very large number)
Number of digits: 5736

Improvement: Now handles any recursion depth by converting recursive calls to iterative loop.

How it works:

  1. Instead of calling itself, the function returns a Thunk (delayed computation)
  2. The trampoline function executes thunks in a loop
  3. Each iteration processes one "recursive call" without using stack space

When to Apply This Solution:
  - What it optimizes for: Eliminating recursion depth limits for tail-recursive functions
  - What it sacrifices: Code complexity, slight performance overhead
  - When to choose this approach: Tail-recursive algorithms that need to handle large inputs
  - When to avoid this approach: When direct iteration is simpler, or the recursion isn't tail-recursive

Alternative 3: Generator-Based Recursion

Generators provide lazy traversal, but a plain recursive generator still consumes one stack frame per level of nesting; combining a generator with an explicit stack removes that limit while keeping lazy evaluation.

Reference Implementation: Let's traverse a tree recursively:

def traverse_tree_recursive(node):
    """Yield all values in a tree recursively."""
    if node is None:
        return
    yield node.value
    yield from traverse_tree_recursive(node.left)
    yield from traverse_tree_recursive(node.right)

# Small tree works fine
root = TreeNode(1,
    TreeNode(2, TreeNode(4), TreeNode(5)),
    TreeNode(3)
)

print(list(traverse_tree_recursive(root)))  # Output: [1, 2, 4, 5, 3]

Current limitation: Deep trees still hit recursion limit.

# Deep tree fails
root = TreeNode(1)
current = root
for i in range(2, 2001):
    current.left = TreeNode(i)
    current = current.left

try:
    result = list(traverse_tree_recursive(root))
    print(f"Traversed {len(result)} nodes")
except RecursionError as e:
    print(f"RecursionError: {e}")

Python Output:

RecursionError: maximum recursion depth exceeded

Iteration 1: Converting to Iterative Generator

After (Iterative generator):

def traverse_tree_iterative(node):
    """Yield all values in a tree using explicit stack."""
    if node is None:
        return

    stack = [node]
    while stack:
        current = stack.pop()
        yield current.value

        # Add children to stack (right first for left-to-right traversal)
        if current.right:
            stack.append(current.right)
        if current.left:
            stack.append(current.left)

# Deep tree now works
root = TreeNode(1)
current = root
for i in range(2, 2001):
    current.left = TreeNode(i)
    current = current.left

result = list(traverse_tree_iterative(root))
print(f"Traversed {len(result)} nodes")  # Output: Traversed 2000 nodes

Improvement: Generator pattern with explicit stack avoids recursion depth issues.

When to Apply This Solution:
  - What it optimizes for: Memory-efficient iteration over large structures
  - What it sacrifices: Slightly more complex than recursive generators
  - When to choose this approach: When you need to iterate over deep structures lazily
  - When to avoid this approach: When you need all results at once (use list-building iteration instead)

Alternative 4: Divide-and-Conquer with Iteration

Some divide-and-conquer algorithms can be implemented iteratively.

Reference Implementation: Merge sort recursively:

def merge_sort_recursive(arr):
    """Sort array using recursive merge sort."""
    if len(arr) <= 1:
        return arr

    mid = len(arr) // 2
    left = merge_sort_recursive(arr[:mid])
    right = merge_sort_recursive(arr[mid:])

    return merge(left, right)

def merge(left, right):
    """Merge two sorted arrays."""
    result = []
    i = j = 0

    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1

    result.extend(left[i:])
    result.extend(right[j:])
    return result

# Small array works
arr = [64, 34, 25, 12, 22, 11, 90]
print(merge_sort_recursive(arr))  # Output: [11, 12, 22, 25, 34, 64, 90]

Current limitation: Very large arrays can hit recursion limit.

The recursion depth for merge sort is O(log n), so it takes a very large array to hit the limit:

# This would need an array of size 2^1000 to hit the limit
# But let's show the iterative alternative anyway
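
As a quick sanity check, the recursion depth of top-down merge sort is roughly ceil(log2(n)), so even enormous inputs stay far below the default limit. A small sketch:

import math

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"n = {n:,}: recursion depth is about {math.ceil(math.log2(n))}")
# n = 1,000: recursion depth is about 10
# n = 1,000,000: recursion depth is about 20
# n = 1,000,000,000: recursion depth is about 30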

Iteration 1: Bottom-Up Merge Sort

After (Iterative merge sort):

def merge_sort_iterative(arr):
    """Sort array using iterative (bottom-up) merge sort."""
    if len(arr) <= 1:
        return arr

    # Make a copy to avoid modifying original
    arr = arr.copy()
    n = len(arr)

    # Start with subarrays of size 1, double each iteration
    size = 1
    while size < n:
        # Merge subarrays of current size
        for start in range(0, n, size * 2):
            mid = min(start + size, n)
            end = min(start + size * 2, n)

            # Merge arr[start:mid] and arr[mid:end]
            left = arr[start:mid]
            right = arr[mid:end]
            merged = merge(left, right)
            arr[start:end] = merged

        size *= 2

    return arr

# Test with large array
import random
large_arr = [random.randint(1, 1000) for _ in range(10000)]
sorted_arr = merge_sort_iterative(large_arr)
print(f"Sorted {len(sorted_arr)} elements")
print(f"First 10: {sorted_arr[:10]}")
print(f"Last 10: {sorted_arr[-10:]}")

Output:

Sorted 10000 elements
First 10: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Last 10: [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]

Improvement: Bottom-up approach eliminates recursion entirely.

Complexity Analysis:

Recursive merge sort:
  - Time Complexity: O(n log n)
  - Space Complexity: O(n) for temporary arrays + O(log n) for the call stack

Iterative merge sort:
  - Time Complexity: O(n log n)
  - Space Complexity: O(n) for temporary arrays only

When to Apply This Solution:
  - What it optimizes for: Eliminating call stack overhead
  - What it sacrifices: The intuitive divide-and-conquer structure
  - When to choose this approach: When recursion depth might be a concern (rare for merge sort)
  - When to avoid this approach: When the recursive version is clearer and depth isn't an issue

Decision Framework: Which Alternative When?

| Scenario | Best Alternative | Why |
|---|---|---|
| Linear recursion (lists, sequences) | Convert to iteration | Simplest, most efficient |
| Tree/graph traversal | Stack-based iteration | Natural fit, uses heap not stack |
| Tail-recursive algorithm | Trampoline | Preserves recursive structure |
| Lazy evaluation needed | Generator-based | Memory-efficient streaming |
| Divide-and-conquer | Bottom-up iteration | Eliminates recursion overhead |
| Complex backtracking | Keep recursion | Recursion is clearest, depth usually manageable |

Lessons Learned

  1. Recursion depth is a real constraint: Python's 1000-call limit is hit more often than you might think
  2. Iteration is often simpler: For linear processing, don't use recursion just because you can
  3. Stack-based iteration is powerful: Simulates recursion using heap memory (much larger than stack)
  4. Tail recursion needs trampolining: Python doesn't optimize tail calls, so use trampolines for deep tail recursion
  5. Generators are underrated: They provide lazy evaluation and can avoid building large intermediate structures
  6. Choose based on problem structure: Trees and graphs benefit from recursion; sequences often don't

The functools.lru_cache Decorator

Python's functools.lru_cache is a powerful decorator that automatically memoizes function results, dramatically improving performance for recursive functions with overlapping subproblems.

What is Memoization?

Memoization is an optimization technique that stores the results of expensive function calls and returns the cached result when the same inputs occur again.

Reference Implementation: Let's start with the classic example—computing Fibonacci numbers recursively:

def fibonacci(n):
    """Compute the nth Fibonacci number recursively."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Test with small inputs
print(f"fibonacci(5) = {fibonacci(5)}")   # Output: 5
print(f"fibonacci(10) = {fibonacci(10)}")  # Output: 55

Output:

fibonacci(5) = 5
fibonacci(10) = 55

Current limitation: This works for small inputs, but what happens with larger values?

import time

# Measure performance
start = time.time()
result = fibonacci(35)
elapsed = time.time() - start

print(f"fibonacci(35) = {result}")
print(f"Time: {elapsed:.2f} seconds")

Output:

fibonacci(35) = 9227465
Time: 3.47 seconds

Why so slow? Let's visualize the recursion tree for fibonacci(5):

                    fib(5)
                   /      \
              fib(4)        fib(3)
             /      \       /     \
        fib(3)    fib(2) fib(2)  fib(1)
       /     \    /    \  /    \
   fib(2) fib(1) fib(1) fib(0) fib(1) fib(0)
   /    \
fib(1) fib(0)

Key observation: fib(3) is computed twice, fib(2) is computed three times, fib(1) is computed five times!

Diagnostic Analysis: Understanding the Inefficiency

The complete execution trace (for fibonacci(5)):

fibonacci(5)
  → fibonacci(4)
    → fibonacci(3)
      → fibonacci(2)
        → fibonacci(1) = 1
        → fibonacci(0) = 0
      → fibonacci(1) = 1
    → fibonacci(2)
      → fibonacci(1) = 1
      → fibonacci(0) = 0
  → fibonacci(3)
    → fibonacci(2)
      → fibonacci(1) = 1
      → fibonacci(0) = 0
    → fibonacci(1) = 1

Let's parse this section by section:

  1. The call pattern: Each call spawns two more calls (except base cases)
     - What this tells us: Exponential growth in the number of calls

  2. Repeated computations:
     - fibonacci(3) computed: 2 times
     - fibonacci(2) computed: 3 times
     - fibonacci(1) computed: 5 times
     - fibonacci(0) computed: 3 times
     - Critical insight: We're recomputing the same values over and over

  3. Total function calls (verified by the counting sketch below):
     - For fibonacci(5): 15 calls
     - For fibonacci(10): 177 calls
     - For fibonacci(35): 29,860,703 calls!
     - Key observation: Grows exponentially with n
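
Those call counts can be verified by instrumenting the naive implementation with a counter. A small sketch, using the illustrative names fib_counted and call_count:

call_count = 0

def fib_counted(n):
    """Naive Fibonacci that counts how many times it is called."""
    global call_count
    call_count += 1
    if n <= 1:
        return n
    return fib_counted(n - 1) + fib_counted(n - 2)

for n in (5, 10, 20):
    call_count = 0
    fib_counted(n)
    print(f"fibonacci({n}) made {call_count:,} calls")
# fibonacci(5) made 15 calls
# fibonacci(10) made 177 calls
# fibonacci(20) made 21,891 calls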

Root cause identified: Overlapping subproblems—the same Fibonacci numbers are computed many times.

Why the current approach can't scale: Time complexity is O(2^n), which becomes impractical for n > 40.

What we need: A way to remember previously computed results.

Iteration 1: Manual Memoization

Before (No memoization):

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

After (Manual memoization):

def fibonacci_memo(n, cache=None):
    """Fibonacci with manual memoization."""
    if cache is None:
        cache = {}

    # Check if result is already cached
    if n in cache:
        return cache[n]

    # Base cases
    if n <= 1:
        return n

    # Compute and cache result
    result = fibonacci_memo(n - 1, cache) + fibonacci_memo(n - 2, cache)
    cache[n] = result
    return result

# Test performance
import time

start = time.time()
result = fibonacci_memo(35)
elapsed = time.time() - start

print(f"fibonacci_memo(35) = {result}")
print(f"Time: {elapsed:.4f} seconds")

Output:

fibonacci_memo(35) = 9227465
Time: 0.0001 seconds

Improvement: From 3.47 seconds to 0.0001 seconds—a 34,700x speedup!

How it works:

  1. Before computing fibonacci(n), check if it's in the cache
  2. If yes, return the cached value immediately
  3. If no, compute it, store in cache, then return

Complexity Analysis:

Without memoization:
  - Time Complexity: O(2^n) (exponential)
  - Space Complexity: O(n) (call stack depth)

With memoization:
  - Time Complexity: O(n) (each value computed once)
  - Space Complexity: O(n) (cache size + call stack depth)

Iteration 2: Using functools.lru_cache

Problem with manual memoization: Requires passing cache parameter, clutters function signature.

Solution: Use Python's built-in lru_cache decorator.

After (Using lru_cache):

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci_cached(n):
    """Fibonacci with automatic memoization."""
    if n <= 1:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

# Test performance
import time

start = time.time()
result = fibonacci_cached(35)
elapsed = time.time() - start

print(f"fibonacci_cached(35) = {result}")
print(f"Time: {elapsed:.4f} seconds")

# Check cache statistics
print(f"Cache info: {fibonacci_cached.cache_info()}")

Output:

fibonacci_cached(35) = 9227465
Time: 0.0001 seconds
Cache info: CacheInfo(hits=33, misses=36, maxsize=None, currsize=36)

Improvement: Same performance as manual memoization, but with cleaner code!

Cache statistics explained:
  - hits=33: Number of times a cached result was returned
  - misses=36: Number of times the function had to compute a new result
  - maxsize=None: Unlimited cache size
  - currsize=36: Number of unique results currently cached

When to Apply This Solution:
  - What it optimizes for: Eliminating redundant computation
  - What it sacrifices: Memory (stores all results)
  - When to choose this approach: Functions with overlapping subproblems (dynamic programming problems)
  - When to avoid this approach: Functions with unique inputs each time, or when memory is constrained

Understanding lru_cache Parameters

maxsize: Controlling Cache Size

The maxsize parameter controls how many results to cache:

from functools import lru_cache

# Unlimited cache (stores all results)
@lru_cache(maxsize=None)
def fibonacci_unlimited(n):
    if n <= 1:
        return n
    return fibonacci_unlimited(n - 1) + fibonacci_unlimited(n - 2)

# Limited cache (stores only 128 most recent results)
@lru_cache(maxsize=128)
def fibonacci_limited(n):
    if n <= 1:
        return n
    return fibonacci_limited(n - 1) + fibonacci_limited(n - 2)

# Test both
print(fibonacci_unlimited(100))
print(fibonacci_unlimited.cache_info())

print(fibonacci_limited(100))
print(fibonacci_limited.cache_info())

Output:

354224848179261915075
CacheInfo(hits=98, misses=101, maxsize=None, currsize=101)
354224848179261915075
CacheInfo(hits=98, misses=101, maxsize=128, currsize=101)

Key differences:

| maxsize | Behavior | Use Case |
|---|---|---|
| None | Unlimited cache | When you need all results cached |
| 128 (default) | LRU eviction when full | Balanced memory/performance |
| 0 | No caching | Disables caching (for testing) |
| Power of 2 | Most efficient | Optimizes hash table performance |

LRU (Least Recently Used): When cache is full, the least recently accessed result is evicted to make room for new results.
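
A tiny sketch makes the eviction visible; maxsize=2 is deliberately small and the function square is illustrative:

from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    print(f"computing {n}**2")
    return n * n

square(1)   # Miss: computed and cached
square(2)   # Miss: cache now holds results for 1 and 2
square(1)   # Hit: 1 becomes the most recently used entry
square(3)   # Miss: evicts 2, the least recently used entry
square(2)   # Miss again: 2 was evicted and must be recomputed
print(square.cache_info())  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)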

typed: Type-Sensitive Caching

The typed parameter controls whether different argument types are cached separately:

from functools import lru_cache

@lru_cache(maxsize=None, typed=False)
def add_untyped(a, b):
    """Cache doesn't distinguish between int and float."""
    print(f"Computing {a} + {b}")
    return a + b

@lru_cache(maxsize=None, typed=True)
def add_typed(a, b):
    """Cache distinguishes between int and float."""
    print(f"Computing {a} + {b}")
    return a + b

# Test untyped
print(add_untyped(1, 2))      # Computes
print(add_untyped(1.0, 2.0))  # Uses cached result (treats 1.0 as 1)

print()

# Test typed
print(add_typed(1, 2))      # Computes
print(add_typed(1.0, 2.0))  # Computes again (different types)

Output:

Computing 1 + 2
3
3

Computing 1 + 2
3
Computing 1.0 + 2.0
3.0

When to use typed=True:
  - When function behavior differs based on argument types
  - When you need precise type-based caching
  - The default, typed=False, is appropriate for most use cases

Cache Management Methods

lru_cache provides methods to inspect and manage the cache:

from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_function(n):
    """Simulate expensive computation."""
    return n ** 2

# Populate cache
for i in range(10):
    expensive_function(i)

# Inspect cache
print(expensive_function.cache_info())
# Output: CacheInfo(hits=0, misses=10, maxsize=128, currsize=10)

# Clear cache
expensive_function.cache_clear()
print(expensive_function.cache_info())
# Output: CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)

# Access underlying function (bypasses cache)
result = expensive_function.__wrapped__(5)
print(f"Direct call result: {result}")
print(expensive_function.cache_info())
# Output: Direct call result: 25
#         CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)

Available methods:

| Method | Purpose |
|---|---|
| cache_info() | Returns cache statistics |
| cache_clear() | Clears all cached results |
| __wrapped__ | Access original function (bypasses cache) |

Common Use Cases

Use Case 1: Dynamic Programming Problems

Problem: Compute number of ways to climb stairs (can take 1 or 2 steps at a time).

from functools import lru_cache

@lru_cache(maxsize=None)
def climb_stairs(n):
    """Number of ways to climb n stairs."""
    if n <= 2:
        return n
    return climb_stairs(n - 1) + climb_stairs(n - 2)

# Without cache, this would take forever
print(f"Ways to climb 50 stairs: {climb_stairs(50)}")
print(f"Cache info: {climb_stairs.cache_info()}")

Output:

Ways to climb 50 stairs: 20365011074
Cache info: CacheInfo(hits=47, misses=50, maxsize=None, currsize=50)

Use Case 2: Expensive Recursive Computations

Problem: Calculate factorial with memoization.

from functools import lru_cache

@lru_cache(maxsize=None)
def factorial(n):
    """Compute factorial with memoization."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)

# Compute large factorials efficiently
print(f"100! has {len(str(factorial(100)))} digits")
print(f"Cache info: {factorial.cache_info()}")

# Subsequent calls are instant
print(f"50! has {len(str(factorial(50)))} digits")
print(f"Cache info: {factorial.cache_info()}")

Output:

100! has 158 digits
Cache info: CacheInfo(hits=0, misses=100, maxsize=None, currsize=100)
50! has 65 digits
Cache info: CacheInfo(hits=1, misses=100, maxsize=None, currsize=100)

Key insight: Computing factorial(50) after factorial(100) requires no new computation—all results are cached!

Use Case 3: Tree/Graph Traversal with Repeated Nodes

Problem: Count paths in a directed acyclic graph where several paths pass through the same nodes.

from functools import lru_cache

# Graph represented as adjacency list
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': ['E'],
    'E': []
}

@lru_cache(maxsize=None)
def count_paths(node, target):
    """Count all paths from node to target."""
    if node == target:
        return 1
    if node not in graph:
        return 0

    total = 0
    for neighbor in graph[node]:
        total += count_paths(neighbor, target)
    return total

# Count paths from A to E
result = count_paths('A', 'E')
print(f"Paths from A to E: {result}")
print(f"Cache info: {count_paths.cache_info()}")

Output:

Paths from A to E: 2
Cache info: CacheInfo(hits=1, misses=5, maxsize=None, currsize=5)

When NOT to Use lru_cache

Case 1: Functions with Unhashable Arguments

lru_cache requires all arguments to be hashable (immutable). Lists and dicts won't work:

from functools import lru_cache

@lru_cache(maxsize=None)
def process_list(items):
    """This will fail!"""
    return sum(items)

try:
    result = process_list([1, 2, 3])
except TypeError as e:
    print(f"TypeError: {e}")

Output:

TypeError: unhashable type: 'list'

Solution: Convert the list to a tuple before the call (the decorator hashes the arguments before the function body runs, so converting inside the function would not help):

@lru_cache(maxsize=None)
def process_items(items):
    """Accepts a hashable tuple instead of a list."""
    return sum(items)

result = process_items(tuple([1, 2, 3]))  # Convert at the call site
print(f"Result: {result}")

Or use a wrapper:

@lru_cache(maxsize=None)
def _process_tuple(items_tuple):
    """Internal cached function."""
    return sum(items_tuple)

def process_list(items):
    """Public API that accepts lists."""
    return _process_tuple(tuple(items))

result = process_list([1, 2, 3])
print(f"Result: {result}")

Case 2: Functions with Side Effects

Caching functions with side effects can lead to unexpected behavior:

from functools import lru_cache

@lru_cache(maxsize=None)
def fetch_data(url):
    """BAD: Caching a function with side effects."""
    print(f"Fetching {url}...")
    # Imagine this makes an HTTP request
    return f"Data from {url}"

# First call executes
print(fetch_data("http://example.com"))

# Second call returns cached result (no fetch!)
print(fetch_data("http://example.com"))

Output:

Fetching http://example.com...
Data from http://example.com
Data from http://example.com

Problem: If the data at the URL changes, the cached result becomes stale.

Solution: Don't cache functions with side effects, or use explicit cache invalidation.
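
If the result must eventually refresh, one hedged option is explicit, time-based invalidation wrapped around the cached function. A minimal sketch, where TTL_SECONDS, fetch_cached, and fetch_fresh are illustrative names rather than functools features:

from functools import lru_cache
import time

TTL_SECONDS = 60
_last_refresh = time.monotonic()

@lru_cache(maxsize=None)
def fetch_cached(url):
    """Imagine this performs an expensive HTTP request."""
    return f"Data from {url}"

def fetch_fresh(url):
    """Return cached data, clearing the cache once it is older than TTL_SECONDS."""
    global _last_refresh
    if time.monotonic() - _last_refresh > TTL_SECONDS:
        fetch_cached.cache_clear()
        _last_refresh = time.monotonic()
    return fetch_cached(url)

print(fetch_fresh("http://example.com"))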

Case 3: Functions with Unique Inputs

If every call has unique arguments, caching provides no benefit:

from functools import lru_cache
import time

@lru_cache(maxsize=128)
def process_unique(timestamp):
    """Every call has a unique timestamp."""
    return timestamp ** 2

# Every call is a cache miss
for i in range(200):
    process_unique(time.time())

# Cache is full but never had a hit
print(process_unique.cache_info())

Output:

CacheInfo(hits=0, misses=200, maxsize=128, currsize=128)

Key insight: Cache is full (128 entries) but had zero hits—pure overhead!

Performance Considerations

Memory Usage

Each cached result consumes memory. For large return values or many unique inputs, this can be significant:

from functools import lru_cache
import sys

@lru_cache(maxsize=None)
def generate_large_list(n):
    """Generate a large list."""
    return list(range(n))

# Cache 100 large lists
for i in range(100):
    generate_large_list(i * 1000)

# Check memory usage (rough lower bound: 8 bytes per list slot for the pointers alone)
cache_info = generate_large_list.cache_info()
total_elements = sum(i * 1000 for i in range(100))
print(f"Cached results: {cache_info.currsize}")
print(f"Approximate memory: {total_elements * 8 / 1024 / 1024:.2f} MB")

Output:

Cached results: 100
Approximate memory: 37.77 MB

Guideline: Use maxsize to limit memory usage when caching large objects.

Thread Safety

lru_cache is thread-safe, but this adds overhead:

from functools import lru_cache
import threading

@lru_cache(maxsize=128)
def thread_safe_function(n):
    """Safe to call from multiple threads."""
    return n ** 2

# Multiple threads can safely call this
threads = []
for i in range(10):
    t = threading.Thread(target=thread_safe_function, args=(i,))  # Pass i explicitly; a lambda would capture i late
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(thread_safe_function.cache_info())

Key insight: Thread safety is built-in, but adds synchronization overhead.

Best Practices Summary

DO:
  - ✓ Use for recursive functions with overlapping subproblems
  - ✓ Use for expensive pure functions (no side effects)
  - ✓ Set maxsize to limit memory usage
  - ✓ Use cache_info() to verify cache effectiveness
  - ✓ Clear the cache when data changes: func.cache_clear()
  - ✓ Convert mutable arguments to immutable (list → tuple)

DON'T:
  - ✗ Cache functions with side effects
  - ✗ Cache functions with unique inputs every time
  - ✗ Use with unhashable arguments (lists, dicts)
  - ✗ Forget about memory usage with an unlimited cache
  - ✗ Cache functions that depend on global state

Comparison: Manual vs lru_cache

| Aspect | Manual Memoization | lru_cache |
|---|---|---|
| Code complexity | Higher (explicit cache management) | Lower (decorator) |
| Flexibility | Full control | Limited to LRU policy |
| Thread safety | Must implement yourself | Built-in |
| Cache statistics | Must implement yourself | Built-in |
| Memory management | Manual | Automatic (with maxsize) |
| Performance | Slightly faster (no decorator overhead) | Slightly slower |

Recommendation: Use lru_cache unless you need custom cache behavior.
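
When you do need custom behavior (for example, caching on a normalized key so that list arguments work), a small hand-rolled decorator is enough. A sketch under that assumption, with memoize_by_key and key_func as illustrative names:

from functools import wraps

def memoize_by_key(key_func):
    """Cache results keyed by key_func(*args, **kwargs) instead of the raw arguments."""
    def decorator(func):
        cache = {}

        @wraps(func)
        def wrapper(*args, **kwargs):
            key = key_func(*args, **kwargs)
            if key not in cache:
                cache[key] = func(*args, **kwargs)
            return cache[key]

        wrapper.cache = cache  # Exposed for inspection or manual clearing
        return wrapper
    return decorator

@memoize_by_key(lambda items: tuple(items))
def total(items):
    return sum(items)

print(total([1, 2, 3]))  # Computed on the first call
print(total([1, 2, 3]))  # Served from the cache

Unlike lru_cache, this sketch is not thread-safe and never evicts entries.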

Lessons Learned

  1. Memoization transforms exponential to linear: For problems with overlapping subproblems, caching can provide orders of magnitude speedup
  2. lru_cache is production-ready: Thread-safe, well-tested, and optimized
  3. Memory is the trade-off: Caching trades memory for speed—use maxsize to control this
  4. Not all recursion benefits: Only functions with repeated inputs benefit from caching
  5. Cache statistics are valuable: Use cache_info() to verify your cache is actually helping
  6. Immutability matters: Arguments must be hashable—convert lists to tuples when needed